Extract information from Data
Example: the relation between years of study and salary.
1. Understand the relation between an “input” and an “output”.
2. Find a function that roughly estimates the data points (called regression).
step size = learning rate
The data points \((x_i, y_i) \in \mathbb{R}^2\) are supposed to be of the form \[y_i = f(x_i) + \epsilon \hspace{3mm} (\epsilon \rightarrow \text{noise})\]
Remark: In general, the function \(f\) is unknown.
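To make the data model concrete, here is a minimal sketch that generates points of the form \(y_i = f(x_i) + \epsilon\). The function `true_f`, the helper `sample_data`, the input range and the Gaussian noise are all assumptions made for illustration; in a real problem \(f\) is unknown and only the pairs \((x_i, y_i)\) are observed.

```python
import random


def true_f(x):
    # Hypothetical ground truth, chosen only for illustration.
    # In practice f is unknown.
    return 2.0 * x + 1.0


def sample_data(n, noise_std=0.5, seed=0):
    """Return n pairs (x_i, y_i) with y_i = true_f(x_i) + noise."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    # Assumption: the noise is Gaussian with mean 0 and standard deviation noise_std.
    ys = [true_f(x) + rng.gauss(0.0, noise_std) for x in xs]
    return xs, ys


xs, ys = sample_data(5)
for x, y in zip(xs, ys):
    print(f"x = {x:.2f}, y = {y:.2f}, f(x) = {true_f(x):.2f}")
```

Each printed `y` differs from `f(x)` by the noise term, which is why \(f\) can only be estimated from the data, not read off exactly.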
We want to approximate this function \(f\) using the data! Find an approximation \(\hat{f}\) of \(f\) using \(\{(x_i, y_i)\}_{1 \leq i \leq N}\).
The nearest neighbours interpolation of \(\{(x_i, y_i)\}_{1 \leq i \leq N}\) is the function \[x \mapsto \hat{f}(x) = y_{i_x}\] where \(i_x \in \text{argmin}_{1 \leq i \leq N} |x - x_i|\).
This means that for an input \(x\) you find the nearest data point \(x_{i_x}\) (i.e. the one with the smallest absolute distance to \(x\)) and assign its corresponding value \(y_{i_x}\) to \(\hat{f}(x)\). If several data points are equally close, any one of them may be chosen.
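The nearest neighbours rule can be sketched in a few lines of plain Python. The function name `nearest_neighbour` and the toy data are hypothetical, and ties are broken by taking the first minimising index, which is one possible choice among the elements of the argmin.

```python
def nearest_neighbour(xs, ys):
    """Build the nearest neighbour interpolant of the points (xs[i], ys[i])."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")

    def f_hat(x):
        # i_x is an index minimising |x - x_i|; min() returns the first
        # such index when several points are equally close.
        i_x = min(range(len(xs)), key=lambda i: abs(x - xs[i]))
        return ys[i_x]

    return f_hat


# Toy data, chosen only for illustration.
xs = [1.0, 2.0, 4.0]
ys = [10.0, 20.0, 40.0]
f_hat = nearest_neighbour(xs, ys)

print(f_hat(1.4))  # 10.0, since x_1 = 1.0 is the closest input
print(f_hat(3.5))  # 40.0, since x_3 = 4.0 is the closest input
```

The resulting \(\hat{f}\) is piecewise constant: it reproduces every \(y_i\) exactly at \(x_i\), including the noise, and jumps halfway between neighbouring inputs.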
Next up: 1.1 Linear Regression